Converting Data to a Useful Form

Data is seldom in a form ready for use in a simulation model. Usually, some analysis and conversion need to be performed before the data can serve as input parameters to the simulation. Random phenomena must be fitted to some standard, theoretical distribution such as a normal or exponential distribution (Law and Kelton, 1991), or be input as a frequency distribution. Activities may need to be grouped together to simplify the description of the system operation.

Distribution Fitting

Defining a variable with a theoretical distribution requires that the data, if available, be fit to the distribution that best describes the variable. ProModel includes the Stat::Fit distribution fitting package, which assists in fitting sample data to a suitable theoretical distribution. An alternative to using a standard theoretical distribution is to summarize the data in the form of a frequency distribution that can be used directly in the model. A frequency distribution is sometimes referred to as an empirical or user-defined distribution.
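As a rough illustration of what fitting involves, the sketch below fits an exponential distribution to a handful of sample times by taking the sample mean (the maximum-likelihood estimate) and then measures the largest gap between the empirical and fitted cumulative distributions. It does not represent how Stat::Fit works; the function names and the sample values are hypothetical and chosen only for the example.

```python
import math
import statistics


def fit_exponential(samples):
    """Return the maximum-likelihood mean of an exponential distribution."""
    return statistics.fmean(samples)


def ks_statistic(samples, mean):
    """Largest gap between the empirical CDF and the fitted exponential CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        fitted = 1.0 - math.exp(-x / mean)
        # The empirical CDF steps from i/n to (i+1)/n at x; check both sides.
        worst = max(worst, abs(fitted - i / n), abs((i + 1) / n - fitted))
    return worst


# Hypothetical delivery times in days (illustrative only, not from the guide).
sample_times = [0.4, 0.9, 1.3, 1.7, 2.2, 2.8, 3.5, 4.1, 5.6, 8.0]

mean_time = fit_exponential(sample_times)
gap = ks_statistic(sample_times, mean_time)
print(f"fitted mean = {mean_time:.2f} days, largest CDF gap = {gap:.3f}")
```

A small gap suggests the candidate distribution describes the data reasonably well; a large gap suggests trying another distribution or using a frequency distribution instead.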

Whether fitting data to a theoretical distribution or using an empirical distribution, it is often useful to organize the data into a frequency distribution table. A frequency distribution is defined by grouping the data into intervals and stating the frequency of occurrence for each interval. To illustrate how this is done, the following frequency table tabulates the number and percentage of observations that fall within each range of delivery time.

Frequency Distributions of Delivery Times

Delivery Time (days)   Number of Observations   Percentage   Cumulative Percentage
0 - 1                  25                       16.5         16.5
1 - 2                  33                       21.7         38.2
2 - 3                  30                       19.7         57.9
3 - 4                  22                       14.5         72.4
4 - 5                  14                       9.2          81.6
5 - 6                  10                       6.6          88.2
6 - 7                  7                        4.6          92.8
7 - 8                  5                        3.3          96.1
8 - 9                  4                        2.6          98.7
9 - 10                 2                        1.3          100.0

Total Number of Observations = 152

While there are rules that have been proposed for determining the interval or cell size, the best approach is to make sure that enough cells are defined to show a gradual transition in values, yet not so many cells that groupings become obscured.
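The effect of cell size can be explored with a small sketch such as the one below, which groups raw observations into equal-width cells for two candidate widths. The function name, the sample values, and the convention that a value on a boundary falls into the upper cell are assumptions made for the example; it also assumes non-negative values.

```python
import math


def frequency_table(values, cell_width):
    """Count observations in equal-width cells [k*w, (k+1)*w), starting at 0."""
    n_cells = math.floor(max(values) / cell_width) + 1
    counts = [0] * n_cells
    for value in values:
        counts[math.floor(value / cell_width)] += 1
    # Each row is (lower bound, upper bound, number of observations).
    return [(k * cell_width, (k + 1) * cell_width, c) for k, c in enumerate(counts)]


# Hypothetical delivery times in days (illustrative only).
sample_times = [0.4, 0.9, 1.3, 1.7, 2.2, 2.8, 3.5, 4.1, 5.6, 8.0]

coarse = frequency_table(sample_times, 2.0)  # few wide cells
fine = frequency_table(sample_times, 1.0)    # more, narrower cells

for low, high, count in coarse:
    print(f"{low:g} - {high:g}: {count}")
```

Comparing the two tables shows the trade-off: the wider cells give a smoother profile, while the narrower cells leave several cells with one or no observations.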

Note in the last column of the frequency table that the percentage for each interval may be expressed optionally as a cumulative percentage. This helps verify that all 100% of the possibilities are included.
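The percentage and cumulative percentage columns can be derived from the observation counts alone, as in the following sketch. The counts are those of the delivery time table; the variable names are illustrative. Because each value is rounded independently for display, a printed figure may differ from a published table in the last digit.

```python
from itertools import accumulate

# Number of observations per one-day interval, from the delivery time table.
counts = [25, 33, 30, 22, 14, 10, 7, 5, 4, 2]

total = sum(counts)
percentages = [100.0 * c / total for c in counts]
# Accumulate the counts first, then convert, to avoid compounding rounding error.
cumulative = [100.0 * running / total for running in accumulate(counts)]

for low, (count, pct, cum) in enumerate(zip(counts, percentages, cumulative)):
    print(f"{low} - {low + 1}  {count:3d}  {pct:5.1f}  {cum:5.1f}")

print(f"Total Number of Observations = {total}")
```

The final cumulative value reaching 100% confirms that every observation has been assigned to an interval.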

When gathering samples from a static population, one can apply descriptive statistics and draw reasonable inferences about the population. When gathering data from a dynamic and possibly time-varying system, however, one must be sensitive to trends, patterns, and cycles that may occur with time. The samples drawn may not actually be homogeneous and may therefore be unsuitable for simple descriptive techniques.
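One simple way to look for such time dependence is to summarize the samples separately for each period before pooling them. The sketch below is only an informal check, not a statistical test; the period labels, the observations, and the function name are hypothetical.

```python
import statistics


def mean_by_period(observations):
    """Group (period, value) pairs and return the mean value for each period."""
    groups = {}
    for period, value in observations:
        groups.setdefault(period, []).append(value)
    return {period: statistics.fmean(values) for period, values in groups.items()}


# Hypothetical (shift, service time in minutes) observations.
observations = [
    ("morning", 4.0), ("morning", 5.0), ("morning", 4.5),
    ("evening", 7.5), ("evening", 8.0), ("evening", 8.5),
]

period_means = mean_by_period(observations)
spread = max(period_means.values()) - min(period_means.values())
print(period_means, f"spread = {spread:.1f} minutes")
```

A large spread between periods suggests that the data should be modeled separately for each period rather than pooled into a single distribution.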

Activity Grouping

Another consideration in converting data to a useful form is the way in which activities are grouped for modeling purposes. Often it is helpful to group activities together so long as important detail is not sacrificed. This makes models easier to define and more manageable to analyze. When multiple activities are grouped into a single activity time for simplification, consideration needs to be given to whether the activities are performed in parallel or in series. If activities are done in parallel or with any overlap, the time during which they overlap should be counted only once rather than added for each activity.

Serial activities are always additive. For example, if a series of activities is performed on an entity at a location, rather than specifying the time for each activity, it may be possible to sum activity times and enter a single time or time distribution.
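The difference between serial and overlapping activities can be sketched as follows. Serial durations are simply summed, while overlapping activities are expressed as (start, end) intervals whose shared time is counted once. The function names and the example times are hypothetical.

```python
def serial_time(durations):
    """Combined time for activities performed one after another."""
    return sum(durations)


def elapsed_time(intervals):
    """Total time covered by (start, end) intervals, counting overlap once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged interval if this one ends later.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Three activities in series: the times add.
serial_total = serial_time([2.0, 3.5, 1.5])

# Two activities of 4 and 3 minutes that overlap for 1 minute.
overlap_total = elapsed_time([(0.0, 4.0), (3.0, 6.0)])

# Two activities started together: the longer one sets the combined time.
parallel_total = elapsed_time([(0.0, 4.0), (0.0, 2.5)])

print(serial_total, overlap_total, parallel_total)
```

In the overlapping case the combined time is 6 minutes rather than the 7 minutes obtained by adding the two durations, which is the error the grouping rule is meant to prevent.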